AITopics | navigation plan

WebDART: Dynamic Decomposition and Re-planning for Complex Web Tasks

Yang, Jingbo, Hou, Bairu, Wei, Wei, Chang, Shiyu, Bao, Yujia

arXiv.org Artificial Intelligence, Oct-9-2025

Large-language-model (LLM) agents are becoming competent at straightforward web tasks, such as opening an item page or submitting a form, but still struggle with objectives that require long-horizon navigation, large-scale information extraction, and reasoning under constraints. WebDART is a general framework that enables a single LLM to handle such complex chores. WebDART (i) dynamically decomposes each objective into three focused sub-tasks--navigation, information extraction, and execution--so the model concentrates on one skill at a time, and (ii) continuously re-plans the decomposition as new webpages are revealed, taking advantage of newly discovered filters or shortcuts and avoiding redundant exploration. LLM-powered web agents have recently shown promising abilities in web navigation tasks (Drouin et al., 2024; He et al., 2024; Wei et al., 2025; Yang et al., 2024a; Pan et al., 2024; Song et al., 2024). Benchmarks such as WebArena (Zhou et al., 2023) demonstrate that these agents achieve reasonable accuracy on simple objectives, highlighting their potential as general-purpose automation tools. However, when the objectives require more complex reasoning and multi-step exploration, the performance of these agents often collapses. As shown in Figure 1, on WebChoreArena (Miyai et al., 2025), a benchmark designed to test higher-complexity web tasks, agents powered by GPT-4o achieve only 8.0% accuracy on tasks across different web domains, far below the 46.6% accuracy on WebArena. This gap highlights a critical weakness of current workflows: while sufficient for simple goals, they are not well equipped for tasks that demand multi-step reasoning, long-horizon navigation, and structured information processing. A closer examination reveals that the difficulty arises from cognitive overload. Complex tasks require agents to simultaneously navigate across multiple web pages, extract and track large amounts of information, and reason under constraints.
Consider the following task from WebChoreArena (Miyai et al., 2025): "Tell me the top 3 products with the highest number of reviews in Home Audio of Electronics within the price range of $1,000 to $9,999". As illustrated in Figure 1, product information is distributed across multiple nested web pages. Each page may contain tens of products with attributes such as price and number of reviews.
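
To make the example task concrete, the sketch below shows only its final reasoning step: once product records have been gathered, filter them by price and keep the three with the most reviews. It is an illustration under stated assumptions, not the paper's implementation; the `Product` record, the `top_reviewed` helper, and the sample catalog are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical record for one extracted product listing."""
    name: str
    price: float
    reviews: int


def top_reviewed(products, low, high, k=3):
    """Return the k products with the most reviews priced within [low, high]."""
    in_range = [p for p in products if low <= p.price <= high]
    return sorted(in_range, key=lambda p: p.reviews, reverse=True)[:k]


# Made-up data standing in for what an extraction step might collect
# across several nested pages.
catalog = [
    Product("Speaker A", 1200.0, 14),
    Product("Speaker B", 899.0, 120),   # most reviews, but below the price range
    Product("Receiver C", 2500.0, 31),
    Product("Soundbar D", 1050.0, 7),
    Product("Amplifier E", 9999.0, 22),
]

top3 = top_reviewed(catalog, 1000, 9999)
print([p.name for p in top3])
```

The hard part described in the abstract is collecting the records across many pages; the aggregation itself is short once the data is in one place, which is one way to read the case for separating extraction from execution.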

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.06587

Genre:

Workflow (0.93)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Vision Language Models Can Parse Floor Plan Maps

DeFazio, David, Mehta, Hrudayangam, Blackburn, Jeremy, Zhang, Shiqi

arXiv.org Artificial Intelligence, Sep-19-2024

Vision language models (VLMs) can simultaneously reason about images and texts to tackle many tasks, from visual question answering to image captioning. This paper focuses on map parsing, a novel task that is unexplored within the VLM context and particularly useful to mobile robots. Map parsing requires understanding not only the labels but also the geometric configurations of a map, i.e., what areas are like and how they are connected. To evaluate the performance of VLMs on map parsing, we prompt VLMs with floorplan maps to generate task plans for complex indoor navigation. Our results demonstrate the remarkable capability of VLMs in map parsing, with a success rate of 0.96 in tasks requiring a sequence of nine navigation actions, e.g., approaching and going through doors. Beyond intuitive observations, e.g., that VLMs do better in smaller maps and simpler navigation tasks, we made the interesting observation that their performance drops in large open areas. We provide practical suggestions to address such challenges as validated by our experimental results. Webpage: https://shorturl.at/OUkEY

navigation, robot, vlm, (15 more...)

arXiv.org Artificial Intelligence

2409.12842

Country:

North America > United States > New York > Broome County > Binghamton (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.94)

RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation

Garg, Sourav, Rana, Krishan, Hosseinzadeh, Mehdi, Mares, Lachlan, Sünderhauf, Niko, Dayoub, Feras, Reid, Ian

arXiv.org Artificial Intelligence, May-9-2024

Mapping is crucial for spatial reasoning, planning and robot navigation. Existing approaches range from metric, which require precise geometry-based optimization, to purely topological, where image-as-node based graphs lack explicit object-level reasoning and interconnectivity. In this paper, we propose a novel topological representation of an environment based on "image segments", which are semantically meaningful and open-vocabulary queryable, conferring several advantages over previous works based on pixel-level features. Unlike 3D scene graphs, we create a purely topological graph with segments as nodes, where edges are formed by a) associating segment-level descriptors between pairs of consecutive images and b) connecting neighboring segments within an image using their pixel centroids. This unveils a "continuous sense of a place", defined by inter-image persistence of segments along with their intra-image neighbours. It further enables us to represent and update segment-level descriptors through neighborhood aggregation using graph convolution layers, which improves robot localization based on segment-level retrieval. Using real-world data, we show how our proposed map representation can be used to i) generate navigation plans in the form of "hops over segments" and ii) search for target objects using natural language queries describing spatial relations of objects. Furthermore, we quantitatively analyze data association at the segment level, which underpins inter-image connectivity during mapping and segment-level localization when revisiting the same place. Finally, we show preliminary trials on segment-level "hopping" based zero-shot real-world navigation. Project page with supplementary details: oravus.github.io/RoboHop/
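
As a rough illustration of what a plan in the form of "hops over segments" could look like, the sketch below runs a breadth-first search over a small graph whose nodes are image segments. The abstract does not say which search procedure the authors use, so breadth-first search is an assumption here, and the node names and the `plan_hops` helper are hypothetical. Edges mix the two kinds described above: links between segments in consecutive images and links between neighboring segments within one image.

```python
from collections import deque


def plan_hops(graph, start, goal):
    """Breadth-first search: return the fewest-hops node list from start to goal, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical segment graph: "imgN/label" names one segment in image N.
segments = {
    "img0/floor": ["img0/door", "img1/floor"],
    "img0/door": ["img0/floor", "img1/door"],
    "img1/floor": ["img0/floor", "img1/door"],
    "img1/door": ["img0/door", "img1/floor", "img2/desk"],
    "img2/desk": ["img1/door"],
}

hops = plan_hops(segments, "img0/floor", "img2/desk")
print(hops)
```

Because every edge counts as one hop, the returned plan minimizes the number of segment transitions rather than any metric distance, which matches the purely topological framing of the map.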

descriptor, navigation, node, (17 more...)

arXiv.org Artificial Intelligence

2405.05792

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Oceania > Australia > Western Australia > Perth (0.04)
Oceania > Australia > South Australia > Adelaide (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Learning-based Preference Prediction for Constrained Multi-Criteria Path-Planning

Osanlou, Kevin, Guettier, Christophe, Bursuc, Andrei, Cazenave, Tristan, Jacopin, Eric

arXiv.org Artificial Intelligence, Aug-2-2021

Learning-based methods are increasingly popular for search algorithms in single-criterion optimization problems. In contrast, for multiple-criteria optimization there are significantly fewer approaches despite the existence of numerous applications. Constrained path-planning for Autonomous Ground Vehicles (AGV) is one such application, where an AGV is typically deployed in disaster relief or search and rescue applications in off-road environments. The agent can be faced with the following dilemma: optimize a source-destination path according to a known criterion and an uncertain criterion under operational constraints. The known criterion is associated with the cost of the path, representing the distance. The uncertain criterion represents the feasibility of driving through the path without requiring human intervention. It depends on various external parameters such as the physics of the vehicle, the state of the explored terrains or weather conditions. In this work, we leverage knowledge acquired through offline simulations by training a neural network model to predict the uncertain criterion. We integrate this model inside a path-planner which can solve problems online. Finally, we conduct experiments on realistic AGV scenarios which illustrate that the proposed framework requires human intervention less frequently, trading for a limited increase in the path distance.

autonomous feasibility, criterion, neural network, (14 more...)

arXiv.org Artificial Intelligence

2108.0108

Country: Europe > France (0.04)

Genre: Research Report > Experimental Study (0.47)

Industry: Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Topological Planning with Transformers for Vision-and-Language Navigation

Chen, Kevin, Chen, Junshen K., Chuang, Jo, Vázquez, Marynel, Savarese, Silvio

arXiv.org Artificial Intelligence, Dec-9-2020

Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g. forward, rotate) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.

agent, navigation, node, (16 more...)

arXiv.org Artificial Intelligence

2012.05292

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > Germany (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Tradeoff-Focused Contrastive Explanation for MDP Planning

Sukkerd, Roykrong, Simmons, Reid, Garlan, David

arXiv.org Artificial Intelligence, Aug-2-2020

End-users' trust in automated agents is important as automated decision-making and planning is increasingly used in many aspects of people's lives. In real-world applications of planning, multiple optimization objectives are often involved. Thus, planning agents' decisions can involve complex tradeoffs among competing objectives. It can be difficult for the end-users to understand why an agent decides on a particular planning solution on the basis of its objective values. As a result, the users may not know whether the agent is making the right decisions, and may lack trust in it. In this work, we contribute an approach, based on contrastive explanation, that enables a multi-objective MDP planning agent to explain its decisions in a way that communicates its tradeoff rationale in terms of the domain-level concepts. We conduct a human subjects experiment to evaluate the effectiveness of our explanation approach in a mobile robot navigation domain. The results show that our approach significantly improves the users' understanding, and confidence in their understanding, of the tradeoff rationale of the planning agent.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2004.1296

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > Canada (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
(2 more...)

Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation

Zang, Xiaoxue, Pokle, Ashwini, Vázquez, Marynel, Chen, Kevin, Niebles, Juan Carlos, Soto, Alvaro, Savarese, Silvio

arXiv.org Artificial Intelligence, Sep-24-2018

We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. The proposed model uses attention mechanisms to connect information from user instructions with a topological representation of the environment. To evaluate this model, we collected a new dataset for the translation problem containing 11,051 pairs of user instructions and navigation plans. Our results show that the proposed model outperforms baseline approaches on the new dataset. Overall, our work suggests that a topological map of the environment can serve as a relevant knowledge base for translating natural language instructions into a sequence of navigation behaviors.

instruction, navigation instruction, sequence, (15 more...)

arXiv.org Artificial Intelligence

1810.00663

Country:

South America > Chile (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.90)

High Level Path Planning with Uncertainty

Qi, Runping, Poole, David L.

arXiv.org Artificial Intelligence, Mar-20-2013

For high level path planning, environments are usually modeled as distance graphs, and path planning problems are reduced to computing the shortest path in distance graphs. One major drawback of this modeling is the inability to model uncertainties, which are often encountered in practice. In this paper, a new tool, called U-graph, is proposed for environment modeling. A U-graph is an extension of distance graphs with the ability to handle a kind of uncertainty. By modeling an uncertain environment as a U-graph, and a navigation problem as a Markovian decision process, we can precisely define a new optimality criterion for navigation plans, and more importantly, we can come up with a general algorithm for computing optimal plans for navigation tasks.
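
For reference, the baseline formulation the abstract starts from, shortest paths in a distance graph, can be sketched with Dijkstra's algorithm as below. This shows only the deterministic baseline, not the U-graph extension or the paper's algorithm; the choice of Dijkstra, the `shortest_path_length` helper, and the sample graph are assumptions made for illustration.

```python
import heapq


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on a graph with non-negative edge distances.

    graph maps each node to a dict of {neighbor: distance}.
    Returns the shortest distance, or infinity if target is unreachable.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")


# Hypothetical distance graph over four locations.
rooms = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"D": 3.0},
    "C": {"B": 2.0, "D": 7.0},
    "D": {},
}

print(shortest_path_length(rooms, "A", "D"))
```

Every edge weight here is a fixed number, which is exactly the limitation the abstract points to: nothing in this model can express that a distance or a connection is uncertain.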

artificial intelligence, configuration, planning & scheduling, (17 more...)

arXiv.org Artificial Intelligence

1303.574

Country: North America > United States (0.68)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Learning to Interpret Natural Language Navigation Instructions from Observations

Chen, David L. (The University of Texas at Austin) | Mooney, Raymond J. (The University of Texas at Austin)

AAAI Conferences, Aug-4-2011

The ability to understand natural-language instructions is critical to building intelligent agents that interact with humans. We present a system that learns to transform natural-language navigation instructions into executable formal plans. Given no prior linguistic knowledge, the system learns by simply observing how humans follow navigation instructions. The system is evaluated in three complex virtual indoor environments with numerous objects and landmarks. A previously collected realistic corpus of complex English navigation instructions for these environments is used for training and testing data. By using a learned lexicon to refine inferred plans and a supervised learner to induce a semantic parser, the system is able to automatically learn to correctly interpret a reasonable fraction of the complex instructions in this corpus.

artificial intelligence, instruction, natural language, (15 more...)

AAAI Conferences

Twenty-Fifth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.93)

Probabilistic Hybrid Action Models for Predicting Concurrent Percept-driven Robot Behavior

Beetz, M., Grosskreutz, H.

Journal of Artificial Intelligence Research, Dec-15-2005

Most autonomous robots are equipped with restricted, unreliable, and inaccurate sensors and effectors and operate in complex and dynamic environments. A successful approach to deal with the resulting uncertainty is the use of controllers that prescribe the robots' behavior in terms of concurrent reactive plans (CRPs) -- plans that specify how the robots are to react to sensory input in order to accomplish their jobs reliably (e.g., McDermott, 1992a; Beetz, 1999). Reactive plans are successfully used to produce situation specific behavior, to detect problems and recover from them automatically, and to recognize and exploit opportunities (Beetz et al., 2001). These kinds of behaviors are particularly important for autonomous robots that have only uncertain information about the world, act in dynamically changing environments, and are to accomplish complex tasks efficiently. Besides reliability and flexibility, foresight is another important capability of competent autonomous robots (McDermott, 1992a).

execution scenario, navigation plan, robot, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1565

AI Access Foundation

10432

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(14 more...)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
(4 more...)
